Geodata Assignment

1 Introduction

The two species I have chosen to look at are the Fisher (Pekania pennanti) and the North American Porcupine (Erethizon dorsatum). Although adult porcupines are generally at very low risk of predation, they are susceptible to a small number of specialist predators, which includes Fishers. Foraging and movement patterns of porcupines are known to be strongly affected by predation risk even when under high levels of nutritional stress (Pokallus & Pauli, 2015). Porcupines and Fishers have been shown to exhibit predator prey dynamics with the presence of Fishers causing declines in porcupine populations in certain areas of range overlap (Earle & Kramm, 1982). Therefore it is possible that porcupines will try to avoid areas where Fishers are present. However as porcupines are believed to be the preferred prey of Fishers (Earle & Kramm, 1982) it is also possible that Fisher presence is dependent upon the presence of Porcupines.

Species 1 = Fisher (P. pennanti)

Species 2 = North American Porcupine (E. dorsatum)

2 Set Up

2.1 Packages used:

library(here)
library(dismo)
library(rworldmap)
library(sf)
library(geodata)
library(tibble)
library(terra)

2.2 Load in a World Map:

world_map <- getMap(resolution = "coarse")

2.3 Creating Species Coordinate Datasets:

To create the datasets first the the species data is downloaded from the GBIF data network, https://www.gbif.org/.

This includes a lot of information not required for species distribution modelling. Therefore latitude and longitude data must be extracted from the larger dataset. This was done using the coordinate extraction function shown below.

Coordinate Extration Function
#This code extracts just the longitude and latitude (coordinate data) values from the GBIF data.

extract_coordinates <- function(speciesdata.gbif) {
    cbind(speciesdata.gbif$lon, speciesdata.gbif$lat) %>% #Combines the long atnd lat data to create coordinates
    na.omit() %>%  
    data.frame() #converts the list into a dataframe object
}

Additionally only coordinates within continental landmasses should be included. Therefore any points that intersected with ocean coordinates were removed. The ocean coordinates were taken from the natural earth dataset, https://naturalearth.s3.amazonaws.com/110m_physical/ne_110m_ocean.zip.

Spherical geometry (s2) switched off
although coordinates are longitude/latitude, st_intersects assumes that they
are planar
although coordinates are longitude/latitude, st_intersects assumes that they
are planar

2.4 Loading and Extracting Climate Variables:

To create a model predicting species distribution based on climate variables you also need a dataset of various climate measurements from the area covered by the species dataset coordinates. Climate data was downloaded from the worldclim database, https://www.worldclim.org/.

The climate variables specific to the range of the chosen species are then extracted from the wider dataset and combined with the coordinate data to create the final processed datasets on which our models will be based.

3 Distribution Modelling

3.1 Species 1

The data downloaded from GBIF only contains data on presence, absence was not recorded. Absence data is required to run the models therefore pseudoabsence data was created by randomly generating 500 points within the study area of each species, using the code below.

Generation of Pseudoabsence Data
set.seed(123) #seed used so that the randomly generated absence points are the same each time and therefore analysis can be consistent each time.

#Generate the pseudoabsence points. 
species1_bg <- randomPoints(species1_extent_mask, 500, ext=species1_extent) 
colnames(species1_bg) <- c("lon", "lat") 

The ability of different climate variables to predict the presence absence data was then compared by creating a set of generalised linear models (GLMs). The models assume binomial errors as the presence absence data is binary data. In total five models were created each containing 3-4 climate variables, as this number of variables per model which was small enough to prevent issues of multiple colinearity, which cause singularities, and overfitting.

Model Summaries:

Model containing variables 1, 2, 3, and 4.


Call:
glm(formula = pa ~ bio1 + bio2 + bio3 + bio4, family = binomial(link = "logit"), 
    data = species1_env_pa_data)

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept) 21.5530421  1.3634661  15.808  < 2e-16 ***
bio1         0.1624200  0.0198623   8.177 2.90e-16 ***
bio2         0.5047900  0.0736122   6.857 7.01e-12 ***
bio3        -0.4860906  0.0332938 -14.600  < 2e-16 ***
bio4        -0.0109573  0.0009384 -11.677  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 3494.9  on 6311  degrees of freedom
Residual deviance: 2943.6  on 6307  degrees of freedom
AIC: 2953.6

Number of Fisher Scoring iterations: 6

Model containing climate variables 5, 6, 7, and 8.


Call:
glm(formula = pa ~ bio5 + bio6 + bio7 + bio8, family = binomial(link = "logit"), 
    data = species1_env_pa_data)

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept)  2.355e+00  7.016e-01   3.357 0.000788 ***
bio5        -4.992e+04  3.923e+04  -1.272 0.203245    
bio6         4.992e+04  3.923e+04   1.272 0.203244    
bio7         4.992e+04  3.923e+04   1.272 0.203245    
bio8         1.662e-02  8.824e-03   1.883 0.059644 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 3494.9  on 6311  degrees of freedom
Residual deviance: 3372.7  on 6307  degrees of freedom
AIC: 3382.7

Number of Fisher Scoring iterations: 5

Model containing climate variables 9, 10, 11, and 12.


Call:
glm(formula = pa ~ bio9 + bio10 + bio11 + bio12, family = binomial(link = "logit"), 
    data = species1_env_pa_data)

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept) -2.0444301  0.5275894  -3.875 0.000107 ***
bio9        -0.0763295  0.0092896  -8.217  < 2e-16 ***
bio10        0.0332500  0.0240264   1.384 0.166391    
bio11        0.0675660  0.0182837   3.695 0.000220 ***
bio12        0.0050917  0.0002272  22.412  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 3494.9  on 6311  degrees of freedom
Residual deviance: 2721.0  on 6307  degrees of freedom
AIC: 2731

Number of Fisher Scoring iterations: 6

Model containing climate variables 13, 14, 15, and 16.


Call:
glm(formula = pa ~ bio13 + bio14 + bio15 + bio16, family = binomial(link = "logit"), 
    data = species1_env_pa_data)

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)  3.430714   0.397551   8.630  < 2e-16 ***
bio13       -0.076640   0.014358  -5.338 9.42e-08 ***
bio14       -0.033002   0.006663  -4.953 7.31e-07 ***
bio15       -0.083754   0.007425 -11.280  < 2e-16 ***
bio16        0.038476   0.005015   7.672 1.70e-14 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 3494.9  on 6311  degrees of freedom
Residual deviance: 2634.3  on 6307  degrees of freedom
AIC: 2644.3

Number of Fisher Scoring iterations: 6

Model containing climate variables 17, 18, and 19.


Call:
glm(formula = pa ~ bio17 + bio18 + bio19, family = binomial(link = "logit"), 
    data = species1_env_pa_data)

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept) -2.2916219  0.2191538 -10.457  < 2e-16 ***
bio17        0.0056914  0.0013397   4.248 2.16e-05 ***
bio18        0.0126564  0.0010758  11.765  < 2e-16 ***
bio19        0.0052539  0.0006468   8.122 4.57e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 3494.9  on 6311  degrees of freedom
Residual deviance: 2753.0  on 6308  degrees of freedom
AIC: 2761

Number of Fisher Scoring iterations: 6

Figure 1: Tables showing summary statistics for each GLM created to model the effects of different climate variables on the distribution of species 1.

All models were able to predict the distribution of Fishers to a certain extent, however variable 10 as well as all of the variables in model 2 do not appear to be as important for predicting distribution as the other climate variables.

To compare how well each model predicts the presence/absence of fishers an Akaike Information Criterion test was performed.

              df      AIC
species1_glm1  5 2953.616
species1_glm2  5 3382.732
species1_glm3  5 2730.988
species1_glm4  5 2644.323
species1_glm5  4 2760.966

Figure 2: Table comparing the AIC values of the species 1 GLMs.

Model 4 has the lowest AIC value and is therefore the best model for predicting fisher distribution, out of those created in this analysis and based on this set of 19 climate variables. This suggests that precipitation of the wettest and driest months, precipitation seasonality and precipitation of the wettest quarter are all very important in determining fisher distribution. The fact that model 4, which contains only variables related to precipitation, was the best model may suggest that precipitation has a stronger effect on fisher distribution than temperature. This aligns with previous evidence that climate change induced droughts have been negatively affecting fisher populations (Kardosky et. al., 2012). However temperature does still appear to have some effect on Fisher distribution.

Evaluating AUC and correlation of models:

The ability of a models predictions to match the observed data can also be evaluated by looking at the area under the curve (AUC) and correlation (cor) values.

Model containing variables 1, 2, 3, and 4.

class          : ModelEvaluation 
n presences    : 5812 
n absences     : 500 
AUC            : 0.7382369 
cor            : 0.3412544 
max TPR+TNR at : 2.21038 

Model containing climate variables 5, 6, 7, and 8.

class          : ModelEvaluation 
n presences    : 5812 
n absences     : 500 
AUC            : 0.5742318 
cor            : 0.1448049 
max TPR+TNR at : 2.188867 

Model containing climate variables 9, 10, 11, and 12.

class          : ModelEvaluation 
n presences    : 5812 
n absences     : 500 
AUC            : 0.7585594 
cor            : 0.3741869 
max TPR+TNR at : 2.223178 

Model containing climate variables 13, 14, 15, and 16.

class          : ModelEvaluation 
n presences    : 5812 
n absences     : 500 
AUC            : 0.7912066 
cor            : 0.4386138 
max TPR+TNR at : 2.407037 

Model containing climate variables 17, 18, and 19.

class          : ModelEvaluation 
n presences    : 5812 
n absences     : 500 
AUC            : 0.7538322 
cor            : 0.3692391 
max TPR+TNR at : 2.157795 

Figure 3: Tables showing the AUC and cor values of GLM of species 1.

The results of evaluating the models using AUC and cor values matched those done by comparing the model AIC values. Model 4 had both the highest AUC and the highest correlation. However its cor values was just below 0.45 suggesting that the correlation between model predictions and the observed distributions are of moderate strength at best. Potentially a model which combines data on precipitation and temperature might have provide more accurate predictions. This might also suggest that additional non-climatic factors are important in determining Fisher distribution. These could include biotic interactions, presence of humans, habitat type etc.

As model 4 is consistently shown to be the best model for predicting Fisher distribution (out of those created in this analysis) this is the one which will be used to map current distribution of Fishers.

Mapping current distribution:

Figure 4: Map showing predicted probabilities of species 1 occurrence based on predictions from model 4.

Figure 5: Map showing current predicted areas of presence and absence for species 1 based on model 4.

The distribution predicted by the model matches the observed data to a certain extent. The high probabilities of occurence found across the entire East side of North America and along the Western coast both line up with the two main clusters of Fisher presence shown by the GBIF data. However the model mostly misses the small group of fishers which inhabit central Canada. On the Eastern coast the area’s of predicted high probability also extend further South and North than observed data. As model four only includes information on precipitation it is possible adding temperature would restrict the latitudinal range predicted by the model. The current distribution map predicts that Fishers are split into two distinct populations, a larger one occupying most of Eastern Canada and the US and a smaller one running along the West Coast. While these are the two main clusters seen in observed data from GBIF the actual distribution of Fishers appears to run in a continuous band across the continent. The fact that the central band of Fishers is not represented in the current distribution is not unexpected as correlation between observed data and model four predictions was moderate therefore we would not expect maps created from model 4 to perfectly match the observed presence.

3.2 Species 2

Model Summaries and AIC:

Model containing variables 1, 2, 3, and 4.


Call:
glm(formula = pa ~ bio1 + bio2 + bio3 + bio4, family = binomial(link = "logit"), 
    data = species2_env_pa_data)

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) 19.354475   0.845817  22.883  < 2e-16 ***
bio1         0.092340   0.012472   7.404 1.32e-13 ***
bio2         0.653465   0.035380  18.470  < 2e-16 ***
bio3        -0.419107   0.016891 -24.813  < 2e-16 ***
bio4        -0.011098   0.000571 -19.435  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 4569.2  on 18121  degrees of freedom
Residual deviance: 3813.2  on 18117  degrees of freedom
  (5 observations deleted due to missingness)
AIC: 3823.2

Number of Fisher Scoring iterations: 7

Model containing climate variables 5, 6, 7, and 8.


Call:
glm(formula = pa ~ bio5 + bio6 + bio7 + bio8, family = binomial(link = "logit"), 
    data = species2_env_pa_data)

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept)  1.955e+00  4.326e-01   4.521 6.17e-06 ***
bio5         6.343e+04  3.928e+04   1.615    0.106    
bio6        -6.343e+04  3.928e+04  -1.615    0.106    
bio7        -6.343e+04  3.928e+04  -1.615    0.106    
bio8        -4.525e-02  8.538e-03  -5.300 1.16e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 4569.2  on 18121  degrees of freedom
Residual deviance: 4378.1  on 18117  degrees of freedom
  (5 observations deleted due to missingness)
AIC: 4388.1

Number of Fisher Scoring iterations: 7

Model containing climate variables 9, 10, 11, and 12.


Call:
glm(formula = pa ~ bio9 + bio10 + bio11 + bio12, family = binomial(link = "logit"), 
    data = species2_env_pa_data)

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) 1.7298829  0.3927317   4.405 1.06e-05 ***
bio9        0.0020425  0.0074304   0.275 0.783408    
bio10       0.0581516  0.0173469   3.352 0.000801 ***
bio11       0.0387746  0.0146505   2.647 0.008130 ** 
bio12       0.0013511  0.0001409   9.591  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 4569.2  on 18121  degrees of freedom
Residual deviance: 4318.6  on 18117  degrees of freedom
  (5 observations deleted due to missingness)
AIC: 4328.6

Number of Fisher Scoring iterations: 7

Model containing climate variables 13, 14, 15, and 16.


Call:
glm(formula = pa ~ bio13 + bio14 + bio15 + bio16, family = binomial(link = "logit"), 
    data = species2_env_pa_data)

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)  4.539140   0.274975  16.507  < 2e-16 ***
bio13        0.020119   0.007670   2.623  0.00872 ** 
bio14        0.008476   0.004540   1.867  0.06192 .  
bio15       -0.030231   0.004656  -6.493 8.43e-11 ***
bio16       -0.007782   0.002614  -2.977  0.00291 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 4569.2  on 18121  degrees of freedom
Residual deviance: 4241.1  on 18117  degrees of freedom
  (5 observations deleted due to missingness)
AIC: 4251.1

Number of Fisher Scoring iterations: 7

Model containing climate variables 17, 18, and 19.


Call:
glm(formula = pa ~ bio17 + bio18 + bio19, family = binomial(link = "logit"), 
    data = species2_env_pa_data)

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept)  3.3560634  0.1253574  26.772  < 2e-16 ***
bio17        0.0158805  0.0009937  15.981  < 2e-16 ***
bio18       -0.0062160  0.0006553  -9.485  < 2e-16 ***
bio19       -0.0020855  0.0004343  -4.802 1.57e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 4569.2  on 18121  degrees of freedom
Residual deviance: 4239.7  on 18118  degrees of freedom
  (5 observations deleted due to missingness)
AIC: 4247.7

Number of Fisher Scoring iterations: 7

Figure 6: Tables showing summary statistics for each GLM created to model the effects of different climate variables on the distribution of species 2.

For the North American Porcupine only three of the models are able to significantly predict occurrence, models 1, 4, and 5. Although the predictive power of model 4 may mostly rely on just one variable, variable 15 (precipitation seasonality).

To compare how well each model predicts the presence/absence of the North American Porcupine an Akaike Information Criterion test was performed.

              df      AIC
species2_glm1  5 3823.151
species2_glm2  5 4388.149
species2_glm3  5 4328.638
species2_glm4  5 4251.094
species2_glm5  4 4247.674

Figure 7: Table comparing the AIC values of the species 2 GLMs.

The model with the lowest AIC value is model 1, which indicates annual mean temperature, mean diurnal range, isothermality and temperature seasonality are all important factors in determining porcupine distribution. This is somewhat suprising as there is evidence suggesting that Porcupines are vulnerable to high precipiation, especially during the winter. High winter precipiation has been shown to greatly increase winter mortality of Porcupines, potentially due to their high susceptibility to pneumonia (Appel et. al. 2021). However as the other two models containing variables linked primarily to temperature had very limited predictive ability the role of precipitation cannot fully be discounted as both models 4 and 5 had lower AICS than models 2 and 3. In general all of the models for species 2 had higher AIC’s than the models for species 1 which suggests that climate is not as important in controlling the range of Porcupines as it is for Fishers.

Evaluating AUC and correlation of models:

The ability of a models predictions to match the observed data can also be evaluated by looking at the area under the curve (AUC) and correlation (cor) values.

Model containing variables 1, 2, 3, and 4.

class          : ModelEvaluation 
n presences    : 17623 
n absences     : 499 
AUC            : 0.7084946 
cor            : 0.2946213 
max TPR+TNR at : 3.07035 

Model containing climate variables 5, 6, 7, and 8.

class          : ModelEvaluation 
n presences    : 17623 
n absences     : 499 
AUC            : 0.6024606 
cor            : 0.1038597 
max TPR+TNR at : 2.917048 

Model containing climate variables 9, 10, 11, and 12.

class          : ModelEvaluation 
n presences    : 17623 
n absences     : 499 
AUC            : 0.6460043 
cor            : 0.1236925 
max TPR+TNR at : 3.253467 

Model containing climate variables 13, 14, 15, and 16.

class          : ModelEvaluation 
n presences    : 17623 
n absences     : 499 
AUC            : 0.7276885 
cor            : 0.1435838 
max TPR+TNR at : 3.933495 

Model containing climate variables 17, 18, and 19.

class          : ModelEvaluation 
n presences    : 17623 
n absences     : 499 
AUC            : 0.7275213 
cor            : 0.1341399 
max TPR+TNR at : 3.860463 

Figure 8: Tables showing the AUC and cor values of each model for species 2.

The results of the evaluation of AUC and cor values are consistent with the AIC values of the models, with model 1 showing the highest AUC and cor values. However the correlation of model 1 was around 0.3. Suggesting that even though it is the best model it is still quite poor at predicting Porcupine distribution. This strengthens the argument that climate is not the main factor determining the range of the North American Porcupine.

However as this is the best model created, in this analysis, to explain Porcupine distribution this is the model used to make predictions about their current distribution.

Mapping current distribution:

Figure 9: Map showing predicted probabilites of species 2 occurrence based on predictions from model 1.

Figure 10: Map showing current predicted areas of Presence and Absence for species 2 based on model 1.

In general the observed data from the GBIF dataset matches with areas predicted by the model to have a high chance of occurence. However the model slightly under estimates the latitudinal range of the porcupine with several presence coordinates falling both above and below the main region of predicted occupation. The binary presence absence map does not include the majority of the large cluster of observations found in central and western America. This model of current distribution actually significantly underestimates the range of Porcupines, missing out the majority of occurences recorded in Western USA. Similarly to the Fisher data it shows two completely separate populations despite the real life observations suggesting otherwise. According to both maps we would expect the climate conditions in south-eastern America to support porcupine habitation. However very few based on the GBIF data there are very few porcupines here potentially this area is lacking other important features such as corrected habitat type or an important food plant.

Overall the current distributions of Fishers and Porcupines are fairly similar, therefore we would expect a high degree of overlap.

4 Distribution Overlap

4.1 Plotting

Although from visual observations the maps of current distribution for both species it is clear they have a large degree of overlap it is still useful to plot maps of the overlap to confirm this. To create Fig 11 the results of the predictions for current distribution of each species were combined into one dataset. In the original prediction models the chance the the species will occur in an area is ranked on a scale of 0-1 with higher numbers indicating a higher chance the species will be found there. By combining these datasets in a summative way it creates a dataframe where values of 0 indicate neither species is likely to be present, values around 1 suggest just one species is present and values approaching 2 indicate areas where both species are likely to occur.

Figure 11: Map showing the probability that between 0 and 2 species occur in a certain area, yellow represents high likelihood of both species occurring and therefore areas of overlap

The individual thresholds for presence of species 1 and 2 were added together to create a threshold above which you would expect both species to occur, therefore allowing us to predict areas of overlapping distribution.

Figure 12: Map showing areas of sympatry between species 1 and 2

The majority of sympatric areas predicted by the models occur in the USA, particularly the Eastern side of it. There is also a small amount of overlap predicted reaching up along the south eastern and south western coast of Canada.

4.2 Metrics

The degree of overlap can be calculated by finding the sum of predicted localities at which sympatry occurs, based on the threshold created to make Fig 12. This threshold is then used to extract localities from a dataset of all coordinates of both species at which overlap is likely. This is then expressed as a proportion of the total coordinates for both species. This allows the metric to be compared across different species pairs without being affected by the number of coordinates available for each species. Therefore the final metric of degree of overlap is expressed as the proportion of total coordinates for each species which are above the 2 species occurence threshold.

Calculation is shown in the code below.

Degree of Overlap Calculation
overlap_localities <- extract(combined >= (species2_tr+species1_tr), combined_coords) #Extracts which localities within the combined coordinates of both species

overlap_metric <- (sum(overlap_localities == 1, na.rm = TRUE))/length(combined_coords$lon) #calculates the total sum of localities and divides them by the total number of coordinates for both species. 
[1] "Degree of Overlap = 0.591194163573531"

Just under 60% of localities for Fishers and North American Porcupines are expected to overlap. This suggests there might be a degree of interspecies distribution dependence, whereby one species is more likely to occur when the other species is also present. This possibility will be explored in the next section.

5 Modelling Interspecies Distribution Dependence

To test whether there is interspecies distribution dependence of species 1 on species 2 firstly you have to calculate whether species 2 occurs at the presence and absence coordinates used for species 1, which was done using the code below.

Calculating presence and absence of species 2 and species 1 coordinates
#extracts the glm predicted chance of occurence for species 2 at each of the coordinates used to model species 1 presence and absence.

species2_occurence_at_species1<- extract(species2_pred, species1_combined_coords)
colnames(species2_occurence_at_species1) <- c("ID", "pa") 

# Converts the chances of occurrence into binary values with 0 showing absence and 1 showing presence based on the threshold for presence species2_tr calculated earlier.

species2_pa_at_species1<- data.frame(ifelse(species2_occurence_at_species1$pa > species2_tr, (paste(1)), (paste(0))))  
colnames(species2_pa_at_species1) <- "species2_pa" 

As shown in the code above the species 2 occurrence values were extracted from predictions made based on a model which included climate variables. Specifically variabesl 1, 2, 3, and 4 as model 1 was used to make the current distribution predictions for species 2. Therefore even when just creating a model for the regression of species 1 presence-absence against extracted species 2 presence-absence climate values are indirectly involved as they contributed to whether species 2 was considered present or absent at each location. However models to test whether climate variables have an interaction on the ability of species 2 to predict the presence of species 1 were also created.

6 Models were created, a null (reduced) model, a simple linear model looking at the correlation between species 1 and 2 distributions, and four models considering the effect of various climate variables in addition to species 2 on species 1. Two different sets of climate variables were chosen and for each set a no interaction and a with interaction (full) model were created. The sets of climate variables used were variables 1, 2, 3, and 4, and variables 13, 14, 15, and 16. These were chosen as they are the variable groups known to be most important in determining the distributions of species 2 and 1 respectively. Therefore they are the most likely to impact the relationship between species 1 and 2.

To see if there was a significant difference in the ability of models to fit the distribution of species chi-squared tests were performed. A chi-squared was used instead of an ANOVA as species distributions are binary data and therefore binomial glm’s were used. Chi-squared is considered more appropriate to use on binomial models than an ANOVA.

                      df      AIC
null_model             1 3496.903
no_climate_model       2 3410.293
no_interaction_model1  6 2927.049
interaction_model1    32 1718.044
no_interaction_model2  6 2640.887
interaction_model2    32 1718.044

Based on preliminary analysis of the AIC values it appears that only including species 2 has very little effect on the fit of the model. It is only when climate variables are considered, whether as interactions or not, that the fit begins to drastically improve. This could suggest that climate has a stronger effect on fisher distribution than porcupines.

Analysis of Deviance Table

Model 1: pa ~ 1
Model 2: pa ~ species2_pa
  Resid. Df Resid. Dev Df Deviance  Pr(>Chi)    
1      6311     3494.9                          
2      6310     3406.3  1    88.61 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Analysis of Deviance Table

Model 1: pa ~ species2_pa
Model 2: pa ~ species2_pa + bio13 + bio14 + bio15 + bio16
  Resid. Df Resid. Dev Df Deviance  Pr(>Chi)    
1      6310     3406.3                          
2      6306     2628.9  4   777.41 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Analysis of Deviance Table

Model 1: pa ~ species2_pa + bio1 + bio2 + bio3 + bio4
Model 2: pa ~ species2_pa * bio1 * bio2 * bio3 * bio4
  Resid. Df Resid. Dev Df Deviance  Pr(>Chi)    
1      6306     2915.1                          
2      6280     1654.0 26     1261 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Analysis of Deviance Table

Model 1: pa ~ species2_pa
Model 2: pa ~ species2_pa + bio13 + bio14 + bio15 + bio16
  Resid. Df Resid. Dev Df Deviance  Pr(>Chi)    
1      6310     3406.3                          
2      6306     2628.9  4   777.41 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Analysis of Deviance Table

Model 1: pa ~ species2_pa + bio13 + bio14 + bio15 + bio16
Model 2: pa ~ species2_pa * bio1 * bio2 * bio3 * bio4
  Resid. Df Resid. Dev Df Deviance  Pr(>Chi)    
1      6306     2628.9                          
2      6280     1654.0 26   974.84 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Figure 14: Tables showing the results of Chi-squared tests between pairs of models for the effect of species 2 on species 1

The statistical analyses above show that Porcupine distribution does have an affect on Fishers as the species only model did perform significantly better when compared to the null model. However both of the no-interaction climate models were significantly better at explaining Fisher distribution than the model just containing species suggesting that climate has a much larger affect on them than Porcupines. However adding climate variables as interactions also improved the fit of the model to fisher distribution, suggesting that porcupines can affect the distribution of Fishers but it is heavily mediated by climate conditions.

As these results seemed to suggest that climate is the main factor influencing fisher distribution I decided to compare whether removing species from no-interaction models actually had any significant effects. Model 4 from figure 1 is the same as the no interaction model for the second set of climate variables therefore these models were compared.

Analysis of Deviance Table

Model 1: pa ~ bio13 + bio14 + bio15 + bio16
Model 2: pa ~ species2_pa + bio13 + bio14 + bio15 + bio16
  Resid. Df Resid. Dev Df Deviance Pr(>Chi)  
1      6307     2634.3                       
2      6306     2628.9  1   5.4362  0.01972 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Figure 15: Results of Chi-Squared test of comparing model using just climate variables with one using species 2 and climate.

Although small there was a significant effect of removing Porcupines as a variable. This indicates that there is a small amount of interspecies dependence even when climate variables are also being considered. Due to the high degree of overlap between the species calculated earlier this likely represents weak positive correlation between the two species.This matches with the knowledge that in many areas Porcupines are the main food source of Fishers (Earle & Kramm, 1982). However as climate is by far the more influential variable does not necessarily suggest that fishers have a dependence on porcupines to inhabit a region. It is likely this could instead reflect overlap in preferred conditions between the species.

6 Future Distribution Prediction

To predict the future distribution of species future climate projections are required to provide data to run the previously created models on. The projection data used here was from CMIP5 (Coupled Model Intercomparison Project Phase 5), which is available from https://aims2.llnl.gov/search.

6.1 Species 1

Future climate predictions are based off model 4 as this was the best at predicting current distribution for fishers. Future predictions were calculated by running a prediction based of the original model 4 but using the new projection climate data as opposed to the current climate data.

Maps:

Figure 15: Maps comparing the Present GLM Occurrences and Distributions of Species 1 with those based on Climate Projections for 2060-2081

There is very little, if any change in the future distribution of fishers linked to climate change based on these models. There is a slight loss of connectivity within the western population. On the other hand the Eastern population shows a slight range expansion northwards into Quebec. Overall fishers are predicted by these models to be able to retain most of their current distribution, with potential for a small range expansion northwards in certain areas.

Range Change Metrics:

[1] "Current suitable localities = 5128"
[1] "Future suitable localities = 5050"
[1] "Number of new suitable localities (expansion) = 85"
[1] "Number of lost suitable localities (contraction) = 93"

As predicted from looking at the maps of distribution changes there is very little difference in the current and future predicted range of fishers. While he species undergoes both range expansions and contractions, the net change is range contraction, although only by a small amount.

6.2 Species 2

Future climate predictions are based off model 1 as this was the best at predicting current distribution for Porcupines. Future predictions were calculated in the same way as for species 1.

Maps:

Figure 16: Maps comparing the present GLM Occurrences and Distributions of Species 2 with those based on Climate Projections for 2060-2081

The range of North American Porcupines is expected in increase significantly, with large expansions both North and South of the current range. Based on climate data alone is appears that climate change could allow porcupines to colonise all of the United States as well as most regions of Canada except the centre. However this increase in range would include a shift in habitat from grasslands and temperate forest towards boreal forests and tundra in the North and tropical forests in the South (NABCI, 2016). Therefore in reality if climate changes are not accompanied by an expansion of the preferred habitat of Porcupines then they might not be able to fully track the climate changes, limiting the observed range expansion. Although there is some evidence that in the Pacific North West Porcupines are beginning to shift their preferred habit away from forests towards desser scrub and grassland habitats (Appel et. al., 2021). This is the main habitat type of the central region of the USA into which this model predicts they will expand into. Therefore this movement may already be starting to happen. However regardless of whether Porcupines are able to track the expansion of their favoured climate niche similar to the Fishers according to these models they would at the very least be expected to retain their current range under climate change conditions.

Range Change Metrics:

[1] "Current suitable localities = 9336"
[1] "Future suitable localities = 16551"
[1] "Number of new suitable localities (expansion = 7971"
[1] "Number of lost suitable localities (contraction) = 0"

As shown by the maps in fig 16 there is a large increase in predicted distribution of the North American Porcupine,

6.3 Degree of Overlap

As the range of each individual species shifts the degree of overlap between the two species is also expected to change.

Figure 17: Maps comparing the present GLM Occurrence overlap and Distributions of Overlap with those based on Climate Projections for 2060-2081

The maps above show that the overlap in distribution between Fishers and North American Porcupines is expected to increase in the future. In particular sympatric area’s in the future are expected to expand up the East coast of Canada, where currently they are mostly confined to the US. There are also expected to gain some new areas of overlap in the mid-western US. This increase in overlap is probably mostly due to the expansion of porcupines northward into Canada and their predicted future colonisation of the entire US as fishers have very little change in distribution between the present and future conditions. The GLM Occurence models as show that as well as the area in which they overlap increasing the general probability that both species co-occur across the region also increases.

[1] "Future Degree of Overlap = 0.659328469644609"

In the future the degree of overlap is predicted to be around 66% which is large than the metric calculated for current distributions. This matches with the maps of how areas of sympatry will change seen in fig 17, which also predict an increase in the number of locations where the two species overlap.

7 Bibliography

Appel, C. L., Moriarty, K. M., Matthews, S. M., Green, D. S., King, S. A., Yaeger, J. S., . . . Bean, W. T. (2021). NORTH AMERICAN PORCUPINE DISTRIBUTION IN THE PACIFIC NORTHWEST AND EVALUATION OF A NON-INVASIVE MONITORING TECHNIQUE. Northwestern Naturalist, 102(1), 9-29. https://doi.org/10.1898/1051-1733-102.1.9

Earle, R. D., & Kramm, K. R. (1982). Correlation Between Fisher and Porcupine Abundance in Upper Michigan. The American Midland Naturalist, 107(2), 244-249. https://doi.org/10.2307/2425375

Kordosky, J., Gese, E., Thompson, C. M., Terletzky, P., Purcell, K., & Schneiderman, J. (2021). Landscape use by fishers (Pekania pennanti): core areas differ in habitat than the entire home range. Canadian Journal of Zoology, 99(4), 289-297. http://dx.doi.org/10.1139/cjz-2020-0073

NABCI. (2016). State of North America’s Birds 2016. Ottowa, Ontario: Environment and Climate Change Canada. <www.stateofthebirds.org>

Pokallus, J. W., & Pauli, J. N. (2016). Predation shapes the movement of a well-defended species, the North American porcupine, even when nutritionally stressed. Behavioural Ecology, 27(2), 470-475. https://doi.org/10.1093/beheco/arv176